
Add implementation of Kafka 0.11 Records #973

Merged: 17 commits into master from the records branch, Oct 31, 2017
Conversation

@wladh (Contributor) commented Oct 26, 2017

This PR introduces an implementation of the new Record and RecordBatch formats from Kafka 0.11. It also introduces a union type for requests/responses that can contain either records or messages.
Issue #901

Kafka supports nullable arrays, and their null value is represented by a
length of -1.
Kafka 0.11 introduces a new Record format that replaces Message from the
previous versions. The new format allows for Headers which are
key-value pairs of application metadata associated with each message.
Kafka 0.11 introduced RecordBatch as a successor to MessageSet.
Using the new RecordBatch is required for transactions and idempotent
message delivery.
Many request/response structures can contain either RecordBatches or
MessageSets depending on the version of Kafka the client is talking to.
This changeset implements a sum type that makes it more convenient to
work with these structures by abstracting away the type of the records.
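The sum type described above can be sketched roughly as follows. This is an illustrative simplification, not sarama's actual definition: the constant and field names mirror the diff snippets in this PR, but the payload types are toy stand-ins.

```go
package main

import "fmt"

// Toy stand-ins for the two payload formats; the real types are far richer.
type MessageSet struct{ Messages []string }
type RecordBatch struct{ Records []string }

const (
	legacyRecords  = iota // pre-0.11 MessageSet
	defaultRecords        // 0.11+ RecordBatch
)

// Records is the sum type: exactly one of msgSet / recordBatch is set,
// and recordsType says which, so callers never branch on Kafka version.
type Records struct {
	recordsType int
	msgSet      *MessageSet
	recordBatch *RecordBatch
}

func (r *Records) numRecords() (int, error) {
	switch r.recordsType {
	case legacyRecords:
		return len(r.msgSet.Messages), nil
	case defaultRecords:
		return len(r.recordBatch.Records), nil
	}
	return 0, fmt.Errorf("unknown records type: %v", r.recordsType)
}

func main() {
	r := &Records{
		recordsType: defaultRecords,
		recordBatch: &RecordBatch{Records: []string{"a", "b"}},
	}
	n, err := r.numRecords()
	fmt.Println(n, err)
}
```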
@eapache (Contributor) left a comment

This is mostly pretty straightforward, just a couple of things.

My biggest concern is around the varint length calculations. I understand the problem you're facing with a dynamic reserve length for the encoder, but I find the solution itself a bit hard to follow. It also results in prep-encoding the majority of the record-set a couple more times than is strictly needed.

For the prep-encoder, the actual reserve length doesn't matter at all; there should be no need to recurse because counting up the lengths is just addition so the order doesn't matter.

For the real-encoder, the prep-encoder has already run so you should be able to know the appropriate reserve length at that point?

@@ -79,7 +79,7 @@ func (rd *realDecoder) getArrayLength() (int, error) {
rd.off = len(rd.raw)
return -1, ErrInsufficientData
}
-	tmp := int(binary.BigEndian.Uint32(rd.raw[rd.off:]))
+	tmp := int(int32(binary.BigEndian.Uint32(rd.raw[rd.off:])))
Contributor
Why cast to int32 and then int?

Contributor Author

Because we want to convert it to a signed value, we convert it to its signed counterpart first (int is usually 64 bits on 64-bit platforms, so converting directly to int wouldn't preserve the sign).

record.go Outdated
controlMask = 0x20
)

type Header struct {
Contributor

should we call this RecordHeader to be a bit clearer?

Contributor Author

Ok

return 0, fmt.Errorf("unknown records type: %v", r.recordsType)
}

func (r *Records) isPartial() (bool, error) {
Contributor

this method (and isControl below) don't have any coverage AFAICT

Contributor Author

Added some coverage for them.

@wladh (Contributor Author) commented Oct 26, 2017

Indeed, the uncompressed records will be prep-encoded twice. But I didn't find a clean solution (one that doesn't involve checking in encode() whether the encoder is real or prep).
I could make varintLengthField also a pushEncoder and keep it around in Record between the encoder runs.

@eapache (Contributor) commented Oct 26, 2017

> I could make varintLengthField also a pushEncoder and keep it around in Record between the encoder runs.

I think that sounds like it might be simpler? If pushEncoder.run could also return additional length then I think this becomes very nice?

@wladh (Contributor Author) commented Oct 27, 2017

Actually, looking at the code again, that wouldn't work without a few more changes. pushEncoder.run is not called by prepEncoder.pop, because the run methods actually write the field. We could define a pushEncoder.prepRun method to be called by prepEncoder.pop, but that's not very appealing to me because it re-introduces knowledge of the encoding phase into encoders that shouldn't care about it. That was one thing I tried to avoid in my solution (and it should work even if we do away with prepEncoder, for instance, and use vectored I/O or builders).

@eapache (Contributor) commented Oct 27, 2017

Hmm, ya. What if reserveLength() was allowed to return a special const dynamicLength = -1? Then the prepEncoder could just call it again in pop if it was flagged as dynamic during the push? Or even a separate isDynamic() bool method on the interface or something...

Added a dynamicPushEncoder interface that extends pushEncoder with an
adjustLength method that is called at prepEncoder.pop() time so that it
computes the actual length of the field.
Also made varintLengthField implement this method so we can avoid a
needless run of the prepEncoder for uncompressed records.
@wladh (Contributor Author) commented Oct 30, 2017

So I added this additional interface dynamicPushEncoder that defines an additional method that will be called during prepEncoder.pop() to get the actual field length. I'm not thrilled with this solution as it implies a prep phase, but at least the knowledge of it is limited to the varintLengthField.
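The mechanism being described can be sketched with heavily simplified interfaces. This is not sarama's actual code; only the shape of the push/putBytes/pop interaction with adjustLength is shown, and the prep encoder here does nothing but count bytes.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// pushEncoder, heavily simplified: it only declares its reserved size.
type pushEncoder interface {
	reserveLength() int
}

// dynamicPushEncoder adds a hook called at pop() time, once the bytes the
// field covers have been counted. It returns the difference between the
// field's actual size and what reserveLength claimed at push time.
type dynamicPushEncoder interface {
	pushEncoder
	adjustLength(currOffset int) int
}

// varintLengthField reserves nothing up front and fixes itself up at pop.
type varintLengthField struct{ startOffset int }

func (l *varintLengthField) reserveLength() int { return 0 }

func (l *varintLengthField) adjustLength(currOffset int) int {
	var buf [binary.MaxVarintLen64]byte
	// Size of the varint prefix for everything pushed since startOffset.
	return binary.PutVarint(buf[:], int64(currOffset-l.startOffset))
}

// prepEncoder is a toy prep pass: it only counts bytes.
type prepEncoder struct {
	length int
	stack  []pushEncoder
}

func (pe *prepEncoder) push(in pushEncoder) {
	if f, ok := in.(*varintLengthField); ok {
		f.startOffset = pe.length
	}
	pe.length += in.reserveLength()
	pe.stack = append(pe.stack, in)
}

func (pe *prepEncoder) putBytes(n int) { pe.length += n }

func (pe *prepEncoder) pop() {
	in := pe.stack[len(pe.stack)-1]
	pe.stack = pe.stack[:len(pe.stack)-1]
	// Only dynamic fields get a second chance to report their real size.
	if dpe, ok := in.(dynamicPushEncoder); ok {
		pe.length += dpe.adjustLength(pe.length)
	}
}

func main() {
	pe := &prepEncoder{}
	pe.push(&varintLengthField{})
	pe.putBytes(200) // a 200-byte record body
	pe.pop()
	fmt.Println(pe.length) // body plus the varint prefix for length 200
}
```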

@eapache (Contributor) left a comment

I like this approach better, thanks. Just some minor stuff now.

length_field.go Outdated
}

func (l *varintLengthField) run(curOffset int, buf []byte) error {
if !l.adjusted {
Contributor

I don't think this is necessary, it will already fail in a pretty obvious way if this mistake is made.

Contributor Author

I can remove it, but I think the failure won't be too obvious (it will have length 0, which could happen in other scenarios as well).

length_field.go Outdated
@@ -31,22 +31,43 @@ func (l *lengthField) check(curOffset int, buf []byte) error {
type varintLengthField struct {
startOffset int
length int64
adjusted bool
size int
@eapache (Contributor) commented Oct 30, 2017

Having both size and length fields is going to cause confusion. Isn't size always calculable? Should we just put that logic in reserveLength() and call it as needed?

Contributor Author

size could always be calculated, but that involves encoding the varint into a buffer, so it's not very cheap.

Contributor

It looks pretty cheap to me? The buffer doesn't escape so it will be allocated on the stack, and the actual encoding itself is just a couple of bit-ops in a very short for-loop.
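The point can be illustrated with Go's encoding/binary: measuring a zigzag varint is a single PutVarint call into a fixed-size buffer that never escapes to the heap. The helper below is a sketch, not sarama's actual code.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// varintSize reports how many bytes v occupies when zigzag-varint encoded.
// buf does not escape, so it stays on the stack: no heap allocation.
func varintSize(v int64) int {
	var buf [binary.MaxVarintLen64]byte
	return binary.PutVarint(buf[:], v)
}

func main() {
	// The boundary between 1- and 2-byte encodings sits at 64 / -65
	// because of the zigzag mapping.
	fmt.Println(varintSize(0), varintSize(63), varintSize(64), varintSize(-64), varintSize(-65))
}
```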

pushEncoder

// Called during pop() to adjust the length of the field.
adjustLength(currOffset int) int
Contributor

I'd maybe call this updateLength but it doesn't matter. The description should mention that the return value is the diff though, not the new length.

Actually, does it need to return anything? The caller can always just call reserveLength again, and subtract the old reserveLength beforehand if it wants the diff?

Contributor Author

That would complicate the caller, since in practice most callers will call reserveLength at push time and then again at pop time just to readjust.
The initial call to reserveLength might not return 0 (especially if we go with your suggestion of computing the size on demand), so most of the time, at adjust time, you want the diff.
I will update the comment.

record.go Outdated
Headers []*RecordHeader

length varintLengthField
totalLength int
Contributor

I believe this is unused now.

record.go Outdated
return 0, err
}
}
return int(r.length.length) + r.length.size, nil
Contributor

It would be simpler and more reliable to just always prep-encode and then ask the encoder. Otherwise this is going to bite people who e.g. add a header or something.

record_batch.go Outdated
case CompressionNone:
re = pe
case CompressionGZIP, CompressionLZ4, CompressionSnappy:
if err := b.computeRecordsLength(); err != nil {
Contributor

This block here is doing basically the same thing as the global encode(encoder) ([]byte, err) method, i.e. using a prep-encoder to calculate the length and then making a byte array and using a real-encoder to encode it?

Contributor Author

You're right, I can reuse that.
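The pattern being referred to, prep-encode to compute the size, allocate once, then real-encode into the buffer, looks roughly like this. This is a minimal sketch with toy interfaces and errors elided; sarama's actual encode helper and packetEncoder interface carry more.

```go
package main

import "fmt"

// packetEncoder is a toy version of a packet encoder interface.
type packetEncoder interface {
	putBytes(b []byte)
}

// encoder is anything that can write itself through a packetEncoder.
type encoder interface {
	encode(pe packetEncoder)
}

// prepEncoder only counts bytes; it never writes anything.
type prepEncoder struct{ length int }

func (pe *prepEncoder) putBytes(b []byte) { pe.length += len(b) }

// realEncoder writes into a pre-sized buffer.
type realEncoder struct {
	raw []byte
	off int
}

func (re *realEncoder) putBytes(b []byte) {
	copy(re.raw[re.off:], b)
	re.off += len(b)
}

// encode runs the prep pass to compute the length, allocates once,
// then runs the real pass to fill the buffer.
func encode(e encoder) []byte {
	prep := &prepEncoder{}
	e.encode(prep)
	re := &realEncoder{raw: make([]byte, prep.length)}
	e.encode(re)
	return re.raw
}

// payload is a trivial encoder used for demonstration.
type payload []byte

func (p payload) encode(pe packetEncoder) { pe.putBytes(p) }

func main() {
	buf := encode(payload("hello"))
	fmt.Println(len(buf), string(buf))
}
```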

@wladh (Contributor Author) commented Oct 31, 2017

It seems that travis-ci failed because one broker didn't come up. Not sure how to retrigger the check.

@eapache (Contributor) commented Oct 31, 2017

I can retrigger CI as needed; I've never been able to figure out how to let contributors do that without giving them full access to everything.

@@ -50,3 +50,14 @@ type pushEncoder interface {
// of data to the saved offset, based on the data between the saved offset and curOffset.
run(curOffset int, buf []byte) error
}

// dynamicPushEncoder extends the interface of pushEncoder for uses cases where the length of the
// fields itself is unknown until its value was computed (for instance varint encoded lenght
Contributor

typo s/lenght/length

record_batch.go Outdated
return pe.pop()
}

var raw []byte
Contributor

if you move this whole block (down to line ~126) into a method, and invert the check on b.compressedRecords you can get rid of lines 77-85 which are duplicates of 127-135.

@wladh (Contributor Author) commented Oct 31, 2017

Maybe I should add a dynamicPushDecoder that will call a decodeField method during push and skip reserveLen.

dynamicPushDecoder extends pushDecoder for cases when the field has
variable length.
Also, changed varintLengthField to make use of the new interface.
@eapache (Contributor) commented Oct 31, 2017

This is really nice, thanks!

@eapache eapache merged commit eca6c1c into IBM:master Oct 31, 2017
@wladh wladh deleted the records branch October 31, 2017 18:13